10 research outputs found

    The promises of large language models for protein design and modeling

    The recent breakthroughs of Large Language Models (LLMs) in natural language processing have opened the way to significant advances in protein research. The relationships between human natural language and the “language of proteins” invite the application and adaptation of LLMs to protein modeling and design. Considering the impressive results of GPT-4 and other recently developed LLMs in processing, generating and translating human languages, we anticipate analogous results with the language of proteins. Indeed, protein language models have already been trained to accurately predict protein properties and to generate novel, functionally characterized proteins, achieving state-of-the-art results. In this paper we discuss the promises and the open challenges raised by this novel and exciting research area, and we propose our perspective on how LLMs will affect protein modeling and design.

    HEMDAG: a family of modular and scalable hierarchical ensemble methods to improve Gene Ontology term prediction.

    MOTIVATION: Automated protein function prediction is a complex multi-class, multi-label, structured classification problem in which protein functions are organized in a controlled vocabulary, according to the Gene Ontology (GO). Hierarchy-unaware classifiers, also known as flat methods, predict GO terms without exploiting the inherent structure of the ontology, potentially violating the True-Path-Rule (TPR) that governs the GO. Hierarchy-aware approaches, even if they obey the TPR, do not always show clear improvements over flat methods, or do not scale well when applied to the full GO. RESULTS: To overcome these limitations, we propose Hierarchical Ensemble Methods for Directed Acyclic Graphs (HEMDAG), a family of highly modular hierarchical ensembles of classifiers, able to build upon any flat method and to provide TPR-safe predictions by leveraging a combination of isotonic regression and TPR learning strategies. Extensive experiments on synthetic and real data across several organisms show, first, that HEMDAG can be used as a general tool to improve the predictions of flat classifiers and, second, that HEMDAG is competitive with state-of-the-art hierarchy-aware learning methods proposed in the last CAFA international challenges. AVAILABILITY: Fully-tested R code freely available at https://anaconda.org/bioconda/r-hemdag. Tutorial and documentation at https://hemdag.readthedocs.io. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
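As an illustration of what a TPR-safe prediction means, the following minimal Python sketch caps each term's flat score at the lowest corrected score among its parents, so that no term scores higher than any of its ancestors. This is a simple top-down capping heuristic chosen for illustration; it is not the isotonic regression or TPR ensemble strategy described in the abstract, and it does not use the HEMDAG R package. The function name, the toy DAG and the term names are all hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def make_tpr_consistent(scores, parents):
    """Return scores in which no term exceeds any of its parents.

    scores  -- dict mapping each term to its flat classifier score in [0, 1]
    parents -- dict mapping each term to a list of its parent terms
    Assumes every parent also appears as a key in `scores`.
    """
    # TopologicalSorter expects node -> predecessors, so parents are
    # emitted before their children.
    graph = {term: parents.get(term, ()) for term in scores}
    corrected = {}
    for term in TopologicalSorter(graph).static_order():
        # A root term has no parents, so its cap defaults to 1.0.
        cap = min((corrected[p] for p in parents.get(term, ())), default=1.0)
        corrected[term] = min(scores[term], cap)
    return corrected


# Hypothetical toy DAG: root -> a, root -> b, and c has parents a and b.
parents = {"root": [], "a": ["root"], "b": ["root"], "c": ["a", "b"]}
flat_scores = {"root": 0.9, "a": 0.4, "b": 0.7, "c": 0.8}
consistent = make_tpr_consistent(flat_scores, parents)
```

In this toy example the flat score of `c` (0.8) exceeds that of its parent `a` (0.4), which would violate the TPR; after correction `c` is capped at 0.4 while the other scores are unchanged.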

    FZD6 triggers Wnt-signalling driven by WNT10BIVS1 expression and highlights new targets in T cell acute lymphoblastic leukemia

    Wnt/Fzd signaling has been implicated in hematopoietic stem cell maintenance and in acute leukemia establishment. In our previous work we described a recurrent rearrangement involving the WNT10B locus (WNT10BR), characterized by the expression of the WNT10BIVS1 transcript variant, in acute myeloid leukemia. To determine the occurrence of WNT10BR in T-cell acute lymphoblastic leukemia (T-ALL), we retrospectively analysed an Italian cohort of patients (n=20) and detected a high incidence (13/20) of WNT10BIVS1 expression. To identify genes involved in the WNT10B molecular response, we designed a Wnt-targeted RNA sequencing panel. Among Wnt agonists and antagonists, the expression of the FZD6, LRP5, and PROM1 genes stands out in WNT10BIVS1-positive patients compared with negative ones. Using MOLT4 and MUTZ-2 as leukemic cell models, which are characterized by the expression of WNT10BIVS1, we observed that WNT10B drives major Wnt activation to the FZD6 receptor complex through receipt of ligand. Additionally, short hairpin RNA (shRNA)-mediated gene silencing and small-molecule-mediated inhibition of WNT secretion were observed to interfere with the WNT10B/FZD6 interaction. We have therefore identified that WNT10BIVS1 knockdown, or pharmacological interference by the LGK974 porcupine (PORCN) inhibitor, reduces WNT10B/FZD6 protein complex formation and significantly impairs intracellular effectors and leukemic expansion. These results describe the molecular circuit induced by WNT10B and suggest WNT10B/FZD6 as a new target in the T-ALL treatment strategy.

    Whole-genome analysis uncovers recurrent IKZF1 inactivation and aberrant cell adhesion in blastic plasmacytoid dendritic cell neoplasm

    Blastic plasmacytoid dendritic cell neoplasm (BPDCN) is a rare and highly aggressive hematological malignancy with a poorly understood pathobiology and no effective therapeutic options. Although a few recurrent genetic defects (e.g., single nucleotide changes, indels, large chromosomal aberrations) have been identified in BPDCN, none are disease-specific and, more importantly, none explain its genesis or clinical behavior. In this study, we performed the first high-resolution whole-genome analysis of BPDCN, with a special focus on structural genomic alterations, by using whole-genome sequencing and RNA sequencing. Our study, the first to characterize the landscape of genomic rearrangements and copy number alterations of BPDCN at nucleotide-level resolution, revealed that IKZF1, a gene encoding a transcription factor required for the differentiation of plasmacytoid dendritic cell precursors, is focally inactivated through recurrent structural alterations in this neoplasm. In concordance with the genomic data, transcriptome analysis revealed that conserved IKZF1 target genes display a loss-of-IKZF1 expression pattern. Furthermore, up-regulation of cellular processes responsible for cell-cell and cell-ECM interactions, which is a hallmark of IKZF1 deficiency, was prominent in BPDCN. Our findings suggest that IKZF1 inactivation plays a central role in the pathobiology of the disease and, consequently, that therapeutic approaches directed at reestablishing the function of this gene might be beneficial for patients.